
    Online Domain Adaptation for Multi-Object Tracking

    Automatically detecting, labeling, and tracking objects in videos depends first and foremost on accurate category-level object detectors. These might, however, not always be available in practice, as acquiring high-quality, large-scale labeled training datasets is either too costly or impractical for all possible real-world application scenarios. A scalable solution consists in re-using object detectors pre-trained on generic datasets. This work is the first to investigate the problem of online domain adaptation of object detectors for causal multi-object tracking (MOT). We propose to alleviate the dataset bias by adapting detectors from category to instances, and back: (i) we jointly learn all target models by adapting them from the pre-trained one, and (ii) we also adapt the pre-trained model online. We introduce an online multi-task learning algorithm to efficiently share parameters and reduce drift, while gradually improving recall. Our approach is applicable to any linear object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive "off-the-shelf" ConvNet features. We quantitatively measure the benefit of our domain adaptation strategy on the KITTI tracking benchmark and on a new dataset (PASCAL-to-KITTI) we introduce to study the domain mismatch problem in MOT. Comment: To appear at BMVC 201
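    The category-to-instance adaptation above can be sketched for a generic linear detector. The hinge-loss subgradient, the regularization pull toward the shared model, and all hyperparameters below are illustrative assumptions for the sketch, not the paper's exact algorithm:

    ```python
    import numpy as np

    def online_mtl_step(w0, W, x, y, idx, eta=0.1, lam=0.5):
        """One hypothetical online multi-task update for linear detectors.

        w0  : pretrained (shared) weight vector, shape (d,)
        W   : per-target weight matrix, shape (k, d), each row adapted from w0
        x,y : one training example (features, +/-1 label) for target `idx`
        """
        w = W[idx]
        margin = y * (w @ x)
        # hinge-loss subgradient plus a pull toward the shared model
        grad = (-y * x if margin < 1 else np.zeros_like(x)) + lam * (w - w0)
        W[idx] = w - eta * grad
        # slowly adapt the shared model toward the current target models
        w0 = (1 - eta * lam) * w0 + eta * lam * W.mean(axis=0)
        return w0, W
    ```

    The shared-model update is what lets parameters flow "from category to instances, and back" while keeping each target model anchored against drift.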

    Combination Schemes for Turning Point Predictions

    We propose new forecast combination schemes for predicting turning points of business cycles. The combination schemes account for the past forecasting performance of a given set of models and can thereby provide better turning point predictions. We consider turning point predictions generated by autoregressive (AR) and Markov-Switching AR models, which are commonly used for business cycle analysis. In order to account for parameter uncertainty, we take a Bayesian approach to both estimation and prediction and compare, in terms of statistical accuracy, the individual models and the combined turning point predictions for the United States and Euro area business cycles.
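    A rough illustration of performance-weighted combination follows; weighting each model by its inverse historical Brier score is an assumption made for the sketch, not the paper's Bayesian scheme:

    ```python
    import numpy as np

    def combine_turning_points(probs, past_scores):
        """Hypothetical combination: weight each model's turning-point
        probability by the inverse of its historical Brier score, so
        better-calibrated models contribute more.

        probs       : array (m,) of P(turning point) from m models
        past_scores : array (m,) of each model's past Brier score
        """
        w = 1.0 / (np.asarray(past_scores) + 1e-12)
        w /= w.sum()
        return float(w @ np.asarray(probs))
    ```

    With `probs = [0.8, 0.2]` and past scores `[0.1, 0.4]`, the better model gets weight 0.8 and the combined turning-point probability is 0.68.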

    Segment-and-count: Vehicle Counting in Aerial Imagery using Atrous Convolutional Neural Networks

    High-resolution aerial imagery can provide detailed and in some cases even real-time information about traffic-related objects. Vehicle localization and counting using aerial imagery play an important role in a broad range of applications. Recently, convolutional neural networks (CNNs) with atrous convolution layers have shown better performance for semantic segmentation compared to conventional convolutional approaches. In this work, we propose a joint vehicle segmentation and counting method based on atrous convolutional layers. This method uses a multi-task loss function to simultaneously reduce pixel-wise segmentation and vehicle counting errors. In addition, the rectangular shapes of vehicle segmentations are refined using morphological operations. In order to evaluate the proposed methodology, we apply it to the public "DLR 3K" benchmark dataset which contains aerial images with a ground sampling distance of 13 cm. Results show that our proposed method reaches 81.58% mean intersection over union in vehicle segmentation and shows an accuracy of 91.12% in vehicle counting, outperforming the baselines.
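    The multi-task loss idea can be sketched as follows; the binary cross-entropy term, the L1 count penalty, and the weight `alpha` are illustrative assumptions rather than the paper's exact formulation:

    ```python
    import numpy as np

    def multitask_loss(seg_probs, seg_labels, pred_count, true_count, alpha=1.0):
        """Hypothetical multi-task loss: pixel-wise binary cross-entropy
        for segmentation plus an L1 penalty on the vehicle count."""
        eps = 1e-12
        bce = -np.mean(seg_labels * np.log(seg_probs + eps)
                       + (1 - seg_labels) * np.log(1 - seg_probs + eps))
        return bce + alpha * abs(pred_count - true_count)
    ```

    Optimizing both terms jointly is what couples the segmentation and counting heads: a sharper segmentation lowers the count error, and the count penalty discourages merged or fragmented vehicle blobs.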

    End-to-End Saliency Mapping via Probability Distribution Prediction

    Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models using eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks whereas saliency estimation is evaluated on topographical maps. In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. We show in extensive experiments the effectiveness of such loss functions over standard ones on four public benchmark datasets, and demonstrate improved performance over state-of-the-art saliency methods.
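    The pairing of a spatial softmax with a distribution distance can be sketched in a few lines; KL divergence is used here as one plausible distance, though the paper's exact measures may differ:

    ```python
    import numpy as np

    def saliency_softmax(logits):
        """Softmax over all pixels turns a raw map into a probability
        distribution over locations (a generalized Bernoulli map)."""
        z = logits.ravel() - logits.max()  # subtract max for stability
        p = np.exp(z)
        return (p / p.sum()).reshape(logits.shape)

    def kl_loss(pred_map, fixation_map):
        """KL divergence from the (normalized) ground-truth fixation
        distribution to the predicted one."""
        eps = 1e-12
        p = fixation_map.ravel() / fixation_map.sum()
        q = pred_map.ravel()
        return float(np.sum(p * np.log((p + eps) / (q + eps))))
    ```

    Because both maps are normalized to sum to one, the loss directly compares topographical distributions rather than treating pixels as independent regression targets.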

    Vision-based simultaneous localization and mapping in changing outdoor environments

    For robots operating in outdoor environments, factors including weather, time of day, rough terrain, high speeds, and hardware limitations make performing vision-based simultaneous localization and mapping with current techniques infeasible due to effects such as image blur and underexposure, especially on smaller platforms and low-cost hardware. In this paper, we present novel visual place-recognition and odometry techniques that address the challenges posed by low lighting, perceptual change, and low-cost cameras. Our primary contribution is a novel two-step algorithm that combines fast low-resolution whole-image matching with a higher-resolution patch-verification step, as well as image saliency methods that simultaneously improve performance and decrease computing time. The algorithms are demonstrated using consumer cameras mounted on a small vehicle in a mixed urban and vegetated environment and a car traversing highway and suburban streets, at different times of day and night and in various weather conditions. The algorithms achieve reliable mapping over the course of a day, both when incrementally incorporating new visual scenes from different times of day into an existing map, and when using a static map comprising visual scenes captured at only one point in time. Using the two-step place-recognition process, we demonstrate for the first time single-image, error-free place recognition at recall rates above 50% across a day-night dataset without prior training or utilization of image sequences. This place-recognition performance enables topologically correct mapping across day-night cycles.
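    The two-step matching idea can be sketched with plain image arrays; the downsampling factor, sum-of-absolute-differences score, and centre-patch verification are illustrative assumptions, not the paper's exact pipeline:

    ```python
    import numpy as np

    def match_place(query, database, top_k=3, patch=8):
        """Hypothetical two-step matcher: (1) rank database images by
        sum-of-absolute-differences on heavily downsampled copies, then
        (2) verify the top candidates at full resolution on a centre patch."""
        def down(img, f=4):
            return img[::f, ::f]

        q_small = down(query)
        sad = [np.abs(q_small - down(db)).sum() for db in database]
        candidates = np.argsort(sad)[:top_k]

        # verification: full-resolution centre-patch distance
        h, w = query.shape
        cy, cx = h // 2, w // 2
        qp = query[cy - patch:cy + patch, cx - patch:cx + patch]
        best = min(candidates,
                   key=lambda i: np.abs(
                       qp - database[i][cy - patch:cy + patch,
                                        cx - patch:cx + patch]).sum())
        return int(best)
    ```

    The cheap low-resolution pass prunes the database so the expensive full-resolution verification only runs on a handful of candidates, which is what makes the approach viable on low-cost hardware.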

    Efficient visual coding and the predictability of eye movements on natural movies

    We deal with the analysis of eye movements made on natural movies in free-viewing conditions. Saccades are detected and used to label two classes of movie patches as attended and non-attended. Machine learning techniques are then used to determine how well the two classes can be separated, i.e. how predictable saccade targets are. Although very simple saliency measures are used and then averaged to obtain just one average value per scale, the two classes can be separated with an ROC score of around 0.7, which is higher than previously reported results. Moreover, predictability is analysed for different representations to obtain indirect evidence for the likelihood of a particular representation. It is shown that the predictability correlates with the local intrinsic dimension in a movie.
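    The ROC score used to quantify separability of attended versus non-attended patches can be computed directly from the per-patch saliency scores; this pairwise formulation is a standard equivalent of the area under the ROC curve:

    ```python
    import numpy as np

    def roc_auc(scores_pos, scores_neg):
        """ROC score as the probability that a randomly chosen attended
        patch outscores a randomly chosen non-attended one (ties count
        half); 0.5 means chance, 1.0 perfect separation."""
        sp = np.asarray(scores_pos, dtype=float)[:, None]
        sn = np.asarray(scores_neg, dtype=float)[None, :]
        return float((sp > sn).mean() + 0.5 * (sp == sn).mean())
    ```

    A score around 0.7, as reported above, means an attended patch outscores a non-attended one in roughly 70% of random pairings.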

    MRCNet: Crowd counting and density map estimation in aerial and ground imagery

    In spite of the many advantages of aerial imagery for crowd monitoring and management at mass events, datasets of aerial images of crowds are still lacking in the field. As a remedy, in this work we introduce a novel crowd dataset, the DLR Aerial Crowd Dataset (DLR-ACD), which is composed of 33 large aerial images acquired from 16 flight campaigns over mass events with 226,291 persons annotated. To the best of our knowledge, DLR-ACD is the first aerial crowd dataset and will be released publicly. To tackle the problem of accurate crowd counting and density map estimation in aerial images of crowds, this work also proposes a new encoder-decoder convolutional neural network, the so-called Multi-Resolution Crowd Network (MRCNet). The encoder is based on the VGG-16 network and the decoder is composed of a set of bilinear upsampling and convolutional layers. Using two losses, one at an earlier level and another at the last level of the decoder, MRCNet estimates crowd counts and high-resolution crowd density maps as two different but interrelated tasks. In addition, MRCNet utilizes contextual and detailed local information by combining high- and low-level features through a number of lateral connections inspired by the Feature Pyramid Network (FPN) technique. We evaluated MRCNet on the proposed DLR-ACD dataset as well as on the ShanghaiTech dataset, a CCTV-based crowd counting benchmark. The results demonstrate that MRCNet outperforms the state-of-the-art crowd counting methods in estimating the crowd counts and density maps for both aerial and CCTV-based images.
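    The two-level supervision can be sketched as a weighted sum of density-map errors, with the crowd count read off as the integral of the final map; the MSE terms and the weight `beta` are illustrative assumptions, not MRCNet's exact losses:

    ```python
    import numpy as np

    def two_level_loss(dens_mid, dens_out, gt_mid, gt_out, beta=0.5):
        """Hypothetical two-level loss: MSE on a coarse intermediate
        density map plus MSE on the final high-resolution map. The
        crowd count is simply the sum over the final density map."""
        mid = np.mean((dens_mid - gt_mid) ** 2)
        out = np.mean((dens_out - gt_out) ** 2)
        return beta * mid + out, float(dens_out.sum())
    ```

    Supervising an intermediate decoder level gives the coarse count signal a shorter gradient path, while the final-level loss keeps the high-resolution density map sharp.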

    Aerial Road Segmentation in the Presence of Topological Label Noise

    The availability of large-scale annotated datasets has enabled Fully-Convolutional Neural Networks to reach outstanding performance on road extraction in aerial images. However, high-quality pixel-level annotation is expensive to produce and even manually labeled data often contains topological errors. Trading off quality for quantity, many datasets rely on already available yet noisy labels, for example from OpenStreetMap. In this paper, we explore the training of custom U-Nets built with ResNet and DenseNet backbones using noise-aware losses that are robust to label omission and registration noise. We perform an extensive evaluation of standard and noise-aware losses, including a novel Bootstrapped DICE-Coefficient loss, on two challenging road segmentation benchmarks. Our losses yield a consistent improvement in overall extraction quality and exhibit a strong capacity to cope with severe label noise. Our method generalizes well to two other fine-grained topology delineation tasks: surface crack detection for quality inspection and cell membrane extraction in electron microscopy imagery.
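    The bootstrapping idea behind such a loss can be sketched as blending the noisy label with the model's own confident prediction before computing a soft DICE score; the blend weight `beta` and thresholding are illustrative assumptions, and the paper's exact formulation may differ:

    ```python
    import numpy as np

    def bootstrapped_dice_loss(pred, label, beta=0.8, eps=1e-7):
        """Sketch of a bootstrapped soft-DICE loss: the target is a mix
        of the (possibly noisy) label and the model's own hard
        prediction, so confidently predicted roads missing from the
        label are penalized less."""
        target = beta * label + (1 - beta) * (pred > 0.5).astype(float)
        inter = 2.0 * np.sum(pred * target)
        union = np.sum(pred) + np.sum(target)
        return float(1.0 - (inter + eps) / (union + eps))
    ```

    Under label omission noise this is exactly the failure mode that matters: a road the annotator missed still contributes to the target once the network predicts it confidently.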